(54) Method and apparatus for reducing leakage power in a cache memory



(57) A method and apparatus are disclosed for re- 
ducing leakage power in a cache memory. A cache de- 
cay technique is employed for both data and instruction 
caches that removes power from cache lines that have 
not been accessed for a predefined time interval, re- 
ferred to as the decay interval. The cache-line granular- 
ity of the present invention permits a significant reduc- 
tion in leakage power while at the same time preserving 
much of the performance of the cache. The decay inter- 
val is maintained using a timer that is reset each time 
the corresponding cache line is accessed. The decay
interval may be fixed or variable. Once the decay interval
timer exceeds a specified decay interval, power to
the cache line is removed. Once power to the cache line 
is removed, the contents of the data and tag fields are 
allowed to decay and the valid bit associated with the 
cache line is reset. When a cache line is later accessed 
after being powered down by the present invention, a 
cache miss is incurred while the cache line is again pow- 
ered up and the data is obtained from the next level of 
the memory hierarchy. 
Description

Cross-Reference to Related Applications 

[0001] This application claims the benefit of United 
States Provisional Application Number 60/243,173, filed
October 25, 2000. 

Field of the Invention 

[0002] The present invention relates generally to 
cache memory devices, and more particularly, to meth- 
ods and apparatus for reducing the leakage power in 
such cache memories. 

Background of the Invention 

[0003] Cache memories reduce memory access 
times of large external memories. FIG. 1 illustrates a 
conventional cache architecture where a cache memory 
120 is inserted between one or more processors 110 
and a main memory 130. Generally, the main memory 
130 is relatively large and slow compared to the cache 
memory 120. The cache memory 120 contains a copy 
of portions of the main memory 130. When the proces- 
sor 110 attempts to read an area of memory, a check is 
performed to determine if the memory contents are already
in the cache memory 120. If the memory contents
are in the cache memory 120 (a cache "hit"), the contents
are delivered directly to the processor 110. If, however,
the memory contents are not in the cache memory
120 (a cache "miss"), a block of main memory 130, consisting
of some fixed number of words, is typically read
into the cache memory 120 and thereafter delivered to 
the processor 110. 

[0004] Cache memories 120 are often implemented 
using CMOS technology. To achieve lower power and 
higher performance in CMOS devices, however, there 
is an increasing trend to reduce the drive supply voltage 
(Vdd) of the CMOS devices. To maintain performance,
a reduction in the drive supply voltage necessitates a
reduction in the threshold voltage (Vth), which in turn
increases leakage power dissipation exponentially. Since
chip transistor counts continue to increase, and every 
transistor that has power applied will leak irrespective 
of its switching activity, leakage power is expected to 
become a significant factor in the total power dissipation 
of a chip. It has been estimated that the leakage power 
dissipated by a chip could equal the dynamic power of 
the chip within three processor generations. 
[0005] One solution for reducing leakage power is to
power down unused devices. M.D. Powell et al., "Gated-Vdd:
A Circuit Technique to Reduce Leakage in Deep-Submicron
Cache Memories," ACM/IEEE International
Symposium on Low Power Electronics and Design (ISLPED)
(2000) and Se-Hyun Yang et al., "An Integrated
Circuit/Architecture Approach to Reducing Leakage in
Deep-Submicron High-Performance I-Caches," ACM/IEEE
International Symposium on High-Performance
Computer Architecture (HPCA) (Jan. 2001) propose a
micro-architectural technique referred to as a dynamically
resizable instruction (DRI) cache and a gated-Vdd
circuit-level technique, respectively, to reduce power
leakage in static random access memory (SRAM) cells
by turning off power to large blocks of the instruction
cache.

[0006] While the techniques disclosed in M.D. Powell
et al. and Se-Hyun Yang et al. reduce leakage power for
instruction caches, a need exists for a method and apparatus
for reducing leakage power in both instruction
and data cache memories. A further need exists for a
method and apparatus for reducing leakage power in
cache memories that can remove the power of individual
cache lines. Yet another need exists for a cache memory
with reduced power consumption.

Summary of the Invention 

[0007] Generally, a method and apparatus are disclosed
for reducing leakage power in a cache memory.
The present invention removes power from cache lines
that have been inactive for some period of time, assuming
that these cache lines are unlikely to be accessed
in the future. A cache decay technique is employed that
removes power from cache lines that have not been accessed
for a predefined time interval, referred to as the
decay interval. The cache decay techniques disclosed
herein reduce leakage power dissipation in cache memories
and thus yield cache memories with reduced power
consumption. The cache-line granularity of the
present invention permits a significant reduction in leakage
power while at the same time preserving much of
the performance of the cache. The cache decay techniques
of the present invention can be successfully applied
to both data and instruction caches, to set-associative
caches and to multilevel cache hierarchies.
[0008] The decay interval is maintained using a timer
that is reset each time the corresponding cache line is
accessed. The decay interval may be variable, permitting
dynamic adjustment: the decay interval may be increased
to improve performance or decreased to save power. If
the decay interval timer exceeds the specified decay interval,
power to the cache line is removed. Once power
to the cache line is removed, the contents of the data
field and the tag field are allowed to degrade (possibly
losing their logical values) while the valid bit associated
with the cache line is reset. When a cache line is later
accessed after being powered down by the present invention,
a cache miss is incurred (because the valid bit
has been reset) while the cache line is again powered
up and the data is obtained from the next level of the
memory hierarchy.
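
The decay behavior summarized above can be sketched in software. The following is a minimal illustrative model and not the disclosed circuit: the class name DecayLine, the methods tick and access, and the use of integer tick counts are assumptions made purely for illustration.

```python
class DecayLine:
    """Illustrative model of one cache line with a decay timer (hypothetical names)."""

    def __init__(self, decay_interval):
        self.decay_interval = decay_interval  # ticks of inactivity tolerated
        self.timer = 0
        self.powered = False   # data and tag fields are unpowered until first fill
        self.valid = False
        self.data = None

    def tick(self):
        """Advance time by one tick; remove power once the interval is exceeded."""
        if not self.powered:
            return
        self.timer += 1
        if self.timer > self.decay_interval:
            self.powered = False   # data and tag contents may now degrade
            self.valid = False     # the valid bit is reset
            self.data = None

    def access(self, fetch_from_next_level):
        """Return (hit, data). Every access resets the timer."""
        self.timer = 0
        if self.valid:
            return True, self.data
        # Miss: restore power and refill from the next level of the hierarchy.
        self.powered = True
        self.data = fetch_from_next_level()
        self.valid = True
        return False, self.data
```

In this sketch a line that is accessed at least once per decay interval never loses power, while an idle line is invalidated and its next access is serviced as an ordinary miss.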

[0009] According to one aspect of the present invention
there is provided a cache memory, comprising a plurality
of cache lines for storing a value from main memory;
and a timer associated with each of said plurality of
cache lines, each of said timers configured to control a
signal that removes power to said associated cache line
after a decay interval. A timer associated with a given
cache line may be reset each time said associated
cache line is accessed. Said decay interval may be variable.
Said variable decay interval may be increased to
increase performance. Said variable decay interval may
be lowered to save power. Said variable decay interval
may be implemented by adjusting a reference value in
a comparator. Said variable decay interval may be implemented
by adjusting a bias of a comparator.
[0010] Said timer may be a k-bit timer and said timer
receives a tick from a global N-bit counter where k is
less than N. Said timer may be a k-bit timer and said
timer may receive a tick from any source. Said timer may
be any k-state finite state machine (FSM) that can function
logically as a counter.

[0011] The cache memory may further comprise a
dirty bit associated with each of said cache lines to indicate
when the contents of said cache line must be written
back to main memory before said power is removed
from said associated cache line after said decay interval.
One or more of said timers associated with said plurality
of cache lines may be cascaded to distribute said
writing back to main memory over time.
[0012] Removing power from said associated cache 
line may reset a valid field associated with said cache 
line. Said signal may be further configured to remove a 
potential from said cache line. A first access to a cache 
line that has been powered down may result in a cache 
miss, may reset said corresponding timer and may re- 
store power to said cache line. A first access to a cache 
line that has been powered down may be delayed by a 
period of time that permits said cache line to stabilize 
after power is restored. Said timer may be an analog 
device that detects a predefined voltage on said device 
corresponding to said decay interval. Said cache mem- 
ory may be a multilevel cache memory, and may further 
comprise one or more inclusion bits associated with 
each of said cache lines, said inclusion bits indicating 
whether data stored in said cache line exists in upper 
levels and wherein said power is removed only from a 
data field associated with said cache line after said de- 
cay interval when said inclusion bits indicate the exist- 
ence of the same data in an upper level. Said cache 
memory may be a multilevel cache memory, and where- 
in said power is removed from a cache line after said 
decay interval only if said power is removed in corre- 
sponding upper levels of said multilevel cache memory. 
[0013] According to another aspect of the present in- 
vention there is provided a method for reducing leakage 
power in a cache memory, said cache memory having 
a plurality of cache lines, said method comprising the 
steps of providing a timer for each of said cache lines; 
resetting said timer each time said cache line is ac- 
cessed; and removing power from said associated 
cache line after a decay interval. Said decay interval
may be variable. Said timer may be a k-bit timer and
said timer receives a tick from a global N-bit counter
where k is less than N. The method further comprises
the step of evaluating a dirty bit associated with each of
said cache lines that indicates when the contents of said
cache line must be written back to main memory before
said power is removed from said associated cache line
after said decay interval. Said step of removing power
from said associated cache line may further comprise
the step of resetting a valid field associated with said
cache line. A first access to a cache line that has been
powered down may further comprise the steps of establishing
a cache miss, resetting said corresponding timer
and restoring power to said cache line. A first access to
a cache line that has been powered down may further
comprise the step of delaying said access by an appropriate
amount of time until said cache line stabilizes after
power is restored.

[0014] According to a further aspect of the present invention
there is provided a cache memory, comprising
a plurality of cache lines for storing a value from main
memory, each of said cache lines comprised of one or
more dynamic random access memory (DRAM) cells,
each of said DRAM cells being refreshed each time said
cache line is accessed, each of said DRAM cells reliably
storing said value for a safe period; and a timer associated
with each of said plurality of cache lines, each of
said timers controlling a signal that resets a valid bit associated
with said cache line after said safe period. Said
safe period may be established based on characteristics
of an integrated circuit (IC) process used to manufacture
said DRAM cells. Said DRAM cells may be embodied
as 4-T DRAM cells. Said timer may be a k-bit timer and
said timer receives a tick from a global N-bit counter
where k is less than N. Said timer may be a k-bit timer
and said timer receives a tick from any source. Said timer
may be any k-state finite state machine (FSM) that
can function logically as a counter.
[0015] The cache memory may further comprise a
dirty bit associated with each of said cache lines to indicate
when the contents of said cache line must be written
back to main memory before said power is removed
from said associated cache line after said decay interval.

[0016] According to a still further aspect of the present
invention there is provided a method for reducing leakage
power in a cache memory, said cache memory having
a plurality of cache lines for storing a value from
main memory, each of said cache lines comprised of one
or more dynamic random access memory (DRAM) cells,
each of said DRAM cells reliably storing said value for
a safe period, said method comprising the steps of refreshing
each of said DRAM cells each time said corresponding
cache line is accessed; providing a timer
for each of said cache lines; resetting said timer each
time said cache line is accessed; and resetting a valid
bit associated with said cache line after said safe period.
Said safe period may be established based on characteristics
of an integrated circuit (IC) process used to
manufacture said DRAM cells.

[0017] A more complete understanding of the present
invention, as well as further features and advantages of 
the present invention, will be obtained by reference to 
the following detailed description and drawings. 

Brief Description of the Drawings 

[0018] 

FIG. 1 illustrates a conventional cache architecture;
FIG. 2 illustrates the structure of the conventional
cache memory of FIG. 1 in further detail;
FIG. 3 illustrates a digital implementation of a cache
memory in accordance with the present invention;
FIG. 4 illustrates exemplary circuitry for each cache
line in the exemplary digital implementation of FIG. 3;
FIG. 5 provides a state diagram for the exemplary
two-bit counter of FIGS. 3 and 4;
FIG. 6 is a block diagram illustrating cache-line
power control aspects of the present invention;
FIG. 7 illustrates an analog implementation of a decay
counter for a cache memory in accordance with
the present invention;
FIG. 8 illustrates the structure of an alternative
cache memory in accordance with the present invention;
FIG. 9 illustrates a 4-transistor hybrid SRAM/DRAM
memory cell without a path to ground; and
FIG. 10 illustrates a state diagram for a one-bit
counter used to decay 4T-DRAM cache lines.

Detailed Description 

[0019] FIG. 2 illustrates the structure of the conven- 
tional cache memory 120 of FIG. 1 in further detail. As 
shown in FIG. 2, the cache memory 120 consists of C 
cache lines of K words each. The number of lines in the 
cache memory 120 is generally considerably less than 
the number of blocks in main memory 130. At any time, 
a portion of the blocks of main memory 130 resides in 
lines of the cache memory 120. An individual line in the 
cache memory 120 cannot be uniquely dedicated to a 
particular block of the main memory 130. Thus, as 
shown in FIG. 2, each cache line includes a tag indicat- 
ing which particular block of main memory 130 is cur- 
rently stored in the cache 120. In addition, each cache 
line includes a valid bit indicating whether the stored da- 
ta is valid. 

[0020] The present invention provides a cache decay 
technique that removes power from cache lines that 
have not been accessed for a predefined time interval, 
referred to as the decay interval. The decay interval may 
be variable to permit dynamic adjustments to the decay 
interval to allow the decay interval to be increased or 
decreased as desired to increase performance or save
power, respectively.

[0021] The cache decay techniques described herein
reduce leakage power dissipation in caches. The power
to a cache line that has not been accessed within a decay
interval is turned off. When a cache line that has been
powered down by the present invention is thereafter
accessed, a cache miss is incurred while the line is powered
up and the data is fetched from the next level of
the memory hierarchy. It is noted that the present invention
may be employed in any level of the memory hierarchy,
as would be apparent to a person of ordinary skill
in the art. By controlling power using such cache-line
granularity, the present invention achieves a significant
reduction in leakage power while at the same time preserving
much of the performance of the cache. A decay
cache can have an effective powered size much smaller
than a cache of equal miss rate. Alternatively, a decay
cache with the effective powered size of a small cache
performs better than that small cache.
[0022] In addition, the full performance of the decay
cache is available to demanding applications when power
consumption is not an issue. This flexibility of the decay
cache is particularly useful in battery-powered computers.
The cache decay techniques of the present invention
can be successfully applied to both data and instruction
caches. With the increasing importance of
leakage power in upcoming generations of CPUs, and
the increasing size of on-chip memory, cache decay can
be a useful architectural tool to reduce leakage power
consumption.

[0023] As previously indicated, M.D. Powell et al. disclose
a technique for powering down sections of the instruction
cache and resizing the cache that significantly
reduces leakage power. The present invention removes
power from parts of the data cache with a finer granularity
(cache-line granularity) and without resizing. The
present invention relies on the fact that many cache
frames are under-utilized and therefore can be turned
off without impacting performance. Generally, the
present invention attempts to power down the cache
lines that were not accessed for the specified decay interval,
assuming that these will be unlikely to be accessed
in the future.

[0024] Since most consecutive accesses to the same
cache line are spaced closely in time (temporal locality),
a cache line that has not been accessed for some time
either will not be accessed again or is one of the few
cache lines that will be accessed very far into the future.
Therefore, the present invention maintains power to
cache lines as long as they are accessed within the decay
interval. As discussed further below, the passage of
the decay interval from the last access to each cache
line may be detected using a digital or analog implementation.

[0025] The present invention recognizes that cache-line
decay will increase the miss rate of the cache. In
other words, a few lines will be powered off in accordance
with the present invention before they are accessed
again. However, it can be shown that the miss rate of
a decay cache is still less than that of a smaller cache whose
size matches the average powered size of the decay
cache. Another way to view the decay cache is from a
leakage power efficiency perspective, where the average
powered size of a decay cache is smaller than a
cache of equal miss rate.

Digital Implementation 

[0026] The recency of a cache line access can be represented
via a digital counter that is cleared on each
access to the cache line and incremented periodically
at fixed time intervals. Once the counter reaches a specified
count, the counter saturates and removes the power
(or ground) to the corresponding cache line.
[0027] It has been observed that decay intervals tend
to be on the order of tens or hundreds of thousands of
cycles. The number of cycles needed for a reasonable
decay interval thus makes it impractical for the counters
to count cycles (too many counter bits would be required).
Instead, the number of required bits can be reduced
by "ticking" the counters at a much coarser level,
for example, every few thousand cycles. A global cycle
counter can be utilized to provide the ticks for smaller
cache-line counters. Simulations have shown that a
two-bit counter per cache line provides sufficient resolution
with four quantized counter levels. For example,
if a cache line should be powered down 10,000 clock
cycles following the most recent access, each of the four
quantized counter levels corresponds to 2,500 cycles.
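
The arithmetic behind the coarse ticking can be worked through as follows. This is only an illustrative calculation using the 10,000-cycle example from the text; the helper name counter_bits is an assumption introduced here.

```python
def counter_bits(max_count):
    """Bits needed for a binary counter that must be able to hold max_count."""
    return max(1, max_count.bit_length())

decay_interval_cycles = 10_000   # example decay interval from the text
local_bits = 2                   # two-bit counter per cache line
levels = 2 ** local_bits         # four quantized counter levels

# Cycles represented by each quantized level (the global tick period).
tick_period = decay_interval_cycles // levels

# Width each cache line would need if its counter counted raw cycles instead.
bits_if_counting_cycles = counter_bits(decay_interval_cycles)

print(tick_period)               # 2500
print(bits_if_counting_cycles)   # 14
```

Under these assumptions, each cache line carries 2 counter bits instead of 14, with the wider counting done once in the shared global counter.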
[0028] FIG. 3 illustrates a digital implementation of a
cache memory 300 in accordance with the present invention.
As shown in FIG. 3, the cache memory 300 includes
a two-bit saturating counter 320-n (hereinafter,
collectively referred to as counters 320) associated with
each cache line, and an N-bit global counter 310. In addition,
each cache line includes a tag indicating which
particular block of main memory 130 is currently stored
in the cache line and a valid bit indicating whether the
stored data is valid. To save power, the global counter
310 can be implemented, e.g., as a binary ripple counter.
An additional latch (not shown) holds a maximum
count value that is compared to the global counter 310.
When the global counter 310 reaches the maximum value,
the global counter 310 is reset and a one-clock-cycle
T signal is generated on a global time signal distribution
line 330. The maximum count latch (not shown) is non-switching
and does not contribute to dynamic power dissipation.
In general and on average using small cache
line counters, very few bits are expected to switch per
cycle.

[0029] To minimize state transitions in the counters
320 and thus minimize dynamic power consumption, the
exemplary digital implementation of the present invention
uses Gray coding so that only one bit changes state
at any time. Furthermore, to simplify the counters 320
and minimize the transistor count, the counters 320 are
implemented asynchronously. In a further variation, the
counters 310, 320 can be implemented as shift registers.
FIG. 4 illustrates the circuitry 400 required for each
cache line in the exemplary digital implementation. Each
cache line contains circuitry to implement the state machine
depicted in FIG. 5.

[0030] FIG. 5 provides the state diagram 500 for exemplary
two-bit (S0, S1), saturating, Gray-code
counters 320 with two inputs (WRD and T). T is the global
time signal generated by the (synchronous) global
counter 310 to indicate the passage of time. T is a well-behaved
digital signal whose period may be adjusted
externally to provide different decay intervals appropriate
for different programs. The second state machine
input is the cache line access signal, WRD, which is decoded
from the address and is the same signal used to
select a particular row within the cache memory 300 (e.g.,
the WORD-LINE signal). As shown in FIG. 5, state
transitions occur asynchronously on changes of the two
input signals, T and WRD. Since T and WRD are well-behaved
signals, there are no meta-stability problems.
The only output is the cache-line switch state, PowerOFF
(POOFF). The counter is reset and returns to
state 00 each time the cache line is accessed.
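
The state machine described above can be modeled in a few lines. The sketch below is an assumption-laden illustration rather than a transcription of FIG. 5: the counting order 00, 01, 11, 10 is the standard two-bit Gray sequence and is assumed here (the text only identifies 00 as the reset state and 10 as the powered-off state), each T pulse is treated as a single discrete step, and WRD is assumed to take priority over T.

```python
GRAY_SEQUENCE = ["00", "01", "11", "10"]  # assumed order; one bit flips per counting step
POWER_OFF_STATE = "10"                    # state in which the line is powered off

def next_state(state, wrd, t):
    """Next counter state given the access signal WRD and the global tick T."""
    if wrd:                               # any access returns the counter to 00
        return "00"
    if t and state != POWER_OFF_STATE:
        return GRAY_SEQUENCE[GRAY_SEQUENCE.index(state) + 1]
    return state                          # saturated, or no tick: hold state

def power_off(state):
    """PowerOFF output: asserted only in the saturated state."""
    return state == POWER_OFF_STATE

# Four ticks with no intervening access drive the counter into saturation.
state = "00"
trace = []
for _ in range(4):
    state = next_state(state, wrd=False, t=True)
    trace.append(state)
print(trace)  # ['01', '11', '10', '10']
```

Once saturated, the model holds state 10 with PowerOFF asserted until the next access returns it to 00.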
[0031] When power to a cache line is turned off (state
10), the cache decay mechanism should disconnect the data and
corresponding tag fields associated with the cache line
from the power supply. Removing power from a cache line
has important implications for the rest of the cache circuitry.
In particular, the first access to a powered-off
cache line should:

1. result in a cache miss (since data and tag might
be corrupted without power);

2. reset the corresponding counter 320-i and restore
power to the cache line (i.e., restart the decay
mechanism as per FIG. 4); and

3. be delayed for a period of time until the cache-line
circuits stabilize after power is restored (the inherent
access time to main memory should be a sufficient
delay in many situations).

[0032] To satisfy these requirements, the present in- 
vention employs the Valid bit of the cache line as part of 
the decay mechanism, as shown in FIG. 6. FIG. 6 illus- 
trates the cache-line power control in accordance with 
the present invention. First, the circuitry shown in FIG. 
6 ensures that the valid bit is always powered (as is the 
counter). Second, the circuitry shown in FIG. 6 provides 
a reset capability to the valid bit so it can be reset to 0 
(invalid) by the decay mechanism. The PowerOFF sig- 
nal clears the valid bit. Thus, the first access to a pow- 
ered-off cache line always results in a miss regardless 
of the contents of the tag. Since satisfying this miss from 
the lower memory hierarchy is the only way to restore 
the valid bit, a newly powered cache line will have 
enough time to stabilize. In addition, no other access (to 
this cache line) can read the possibly corrupted data in 
the interim.
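
The role of the always-powered valid bit can be illustrated with a small model. This is a hedged sketch only: the dictionary layout of a cache line and the function names lookup, power_off and refill are hypothetical and do not correspond to the circuitry of FIG. 6.

```python
def lookup(line, requested_tag):
    """Return True on a hit. The tag is trusted only when the valid bit is set."""
    return line["valid"] == 1 and line["tag"] == requested_tag

def power_off(line):
    """PowerOFF clears the always-powered valid bit; the tag is left to degrade."""
    line["valid"] = 0

def refill(line, requested_tag):
    """Servicing the miss from the next memory level is the only path that sets valid."""
    line["tag"] = requested_tag
    line["valid"] = 1

line = {"valid": 1, "tag": 0x1A}
power_off(line)
stale_match = lookup(line, 0x1A)      # tag still happens to match, but valid is 0
refill(line, 0x1A)
hit_after_refill = lookup(line, 0x1A)
print(stale_match, hit_after_refill)  # False True
```

Even when a decayed tag happens to match the requested address, the cleared valid bit forces a miss, so possibly corrupted data is never returned.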

[0033] The digital implementation can be controlled at 
run-time by the operating system (OS) or by any other 
hardware run-time monitoring system via the global cy- 
cle counter 310. The OS or other system can set the 
period (T_period) of the global counter 310 to produce the
desired decay interval according to the demands of the 
executing application and the power-consumption re- 
quirements of the computer system. Profiling and/or 
run-time monitoring can be used to adjust decay inter- 
vals. Adaptive approaches may also be employed, 
where the decay interval is adjusted individually for each 
cache line. By monitoring the extraneous decay misses 
of each cache line, the decay interval can be adjusted 
to avoid repeating mistakes. Thus, good performance is 
possible without the need to set a decay interval on a 
per-application basis. 
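
One way such run-time control might look is sketched below. The policy is entirely hypothetical and is not taken from the disclosure: the function name adjust_tick_period, the miss-rate thresholds, the doubling and halving steps, and the period limits are all assumptions chosen only to show the direction of adjustment (a longer period favors performance, a shorter period favors power savings).

```python
def adjust_tick_period(tick_period, decay_misses, accesses,
                       high=0.02, low=0.005,
                       min_period=256, max_period=65536):
    """Hypothetical policy for choosing the next global tick period.

    decay_misses: misses attributed to lines that had been powered down.
    accesses:     total accesses observed in the same monitoring window.
    """
    if accesses == 0:
        return tick_period                    # nothing observed; leave unchanged
    rate = decay_misses / accesses
    if rate > high:                           # decay is costing too many misses
        return min(tick_period * 2, max_period)
    if rate < low:                            # room to decay more aggressively
        return max(tick_period // 2, min_period)
    return tick_period

print(adjust_tick_period(2500, decay_misses=50, accesses=1000))  # 5000
print(adjust_tick_period(2500, decay_misses=1, accesses=1000))   # 1250
```

A per-line adaptive scheme would apply the same idea to an individual line's interval rather than to the shared period.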

Analog Implementation 

[0034] The recency of a cache line access can alter- 
natively be implemented using an event, such as the 
charging or discharging of a capacitor 710, as shown in
FIG. 7. Thus, each time a cache line is accessed, the 
capacitor is grounded. In the common case of a fre- 
quently accessed cache-line, the capacitor will be dis- 
charged. Over time, the capacitor is charged through a 
resistor 720 connected to Vdd. Once the charge reaches
a sufficiently high level, a voltage comparator 730 de- 
tects the charge, asserts the PowerOFF signal and dis- 
connects the power supply from the corresponding 
cache line (data bits and tag bits). 
[0035] Although the RC time constant cannot be 
changed (it is determined by the fabricated size of the 
capacitor and resistor), the bias of the voltage comparator
730 can be adjusted to accommodate the temporal
access patterns of different programs. An analog imple- 
mentation is inherently noise sensitive and process 
technology sensitive and can change state asynchro- 
nously with the remainder of the digital circuitry. A meth- 
od of synchronously sampling the voltage comparator 
can be employed to avoid meta-stability. 
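
The dependence of the analog decay interval on the comparator setting can be estimated with the standard first-order RC charging relation, V(t) = Vdd (1 - exp(-t / RC)). The sketch below is an idealized illustration: it models the comparator bias as a simple trip voltage v_ref, and the component values are arbitrary assumptions, not values from the disclosure.

```python
import math

def decay_time(r_ohms, c_farads, v_dd, v_ref):
    """Time for a capacitor charging from 0 V through R toward v_dd to reach v_ref."""
    if not 0 < v_ref < v_dd:
        raise ValueError("v_ref must lie strictly between 0 and v_dd")
    return -r_ohms * c_farads * math.log(1.0 - v_ref / v_dd)

R = 1e6      # ohms (illustrative value only)
C = 1e-12    # farads (illustrative value only)

t_half = decay_time(R, C, v_dd=1.0, v_ref=0.5)   # RC * ln 2
t_high = decay_time(R, C, v_dd=1.0, v_ref=0.9)   # higher trip point, longer interval
print(t_half < t_high)  # True
```

With R and C fixed at fabrication, raising the trip voltage lengthens the interval before PowerOFF is asserted, and lowering it shortens the interval.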
[0036] It is noted that an analog implementation of the 
present invention is particularly well-suited for dynamic 
random access memory (DRAM) technologies where 
data is stored as a charge on a capacitor, as discussed 
hereinafter. In a DRAM implementation, the stored data 
charge will decay naturally and must be refreshed. The 
validity of the data is ensured by a valid bit that decays 
faster than the corresponding stored data charge. Thus, 
the decay of the valid bit can correspond to the decay 
counter in accordance with the present invention. 

Static and Dynamic Random Access Memory Hybrid 

[0037] The cache decay techniques of the present invention
can be applied to hybrid memory technologies
incorporating both static and dynamic memory architectures,
as would be apparent to a person of ordinary skill
in the art based on the disclosure herein. The application
of the present invention to a hybrid memory is illustrated
using a decay cache that employs 4-transistor memory
cells without a path to ground.

[0038] When conventional SRAM circuits are used for
the design of caches in deep-submicron technologies,
static power dissipation is an obvious concern. Attempts
to reduce this power dissipation in applications with
SRAM-based cache memory, such as those described
in M.D. Powell et al., "Gated-Vdd: A Circuit Technique to
Reduce Leakage in Deep-Submicron Cache Memories,"
ACM/IEEE International Symposium on Low Power
Electronics and Design (ISLPED) (2000), employ circuit
design techniques that gate (or cut off) the current
path (from supply voltage) to ground of individual cache
lines. These techniques are effective in cutting off some
of the current flow to ground; however, they have an area-performance
tradeoff resulting from the additional
peripheral circuitry required to gate the current path to
each cache line.

[0039] One approach that is not seen in the literature,
but appears to have a clear power advantage in cache
design, is the use of a multi-transistor DRAM cell, such
as those described in K. Stein et al., "Storage Array and
Sense/Refresh Circuit for Single Transistor Memory
Cells", IEEE Journal of Solid State Circuits, Vol. SC-7,
No. 5, 336-340 (October 1972). This class of DRAM cell,
in particular DRAM cells that employ four transistors per
cell (4T-DRAM), has no path (from supply voltage) to
ground and therefore requires no additional circuitry to
gate the current path to ground. A 4T-DRAM cell is
smaller than an SRAM cell because it requires
only four transistors. 4T-DRAM cells without a path
to ground also have an electronic charge decay that
somewhat mimics the data decay of DRAM memory
cells, discussed above. 4T-DRAM cells naturally decay
over time without the need to "switch" them off, as with
SRAM memory cells that are connected to supply voltage
and ground. In that respect, they behave as DRAM
cells.

[0040] Thus, according to another aspect of the
present invention, cache decay mechanisms are disclosed
for 4T-DRAM caches similar to the SRAM decay
mechanisms described herein. The decay mechanism
is used to indicate when the values of the memory cells
become unreliable because of their intrinsic decay (rather
than to "switch off" the memory cells of the cache line,
since this is unnecessary in this design).

[0041] The 4T-DRAM cells of a cache line are
charged to hold logical values (0 and 1) when the cache
line is written. These memory cells decay over time, losing
their charge. Each time the cells are accessed, they
are re-charged automatically to the correct logical values.
However, if a long period of time elapses without
any access, the memory cells will decay to the point
where they become unreliable and it is not possible to
recover the correct logical values without significant errors.
A safe period can be established in which accessed
cells are still reliable. The exact value of this period
(on the order of milliseconds) depends on characteristics
of the integrated circuit (IC) process used to
manufacture the memory cells.

[0042] The purpose of a decay mechanism in this
case is to signal the end of the safe period after the last
access. The maximum decay interval in this case is
equal to the safe period. Similar to SRAM caches, the
Valid bit can be used to indicate that the cache line is
considered invalid. The Valid bit should not decay with
the rest of the memory cells and should always be reliable.
The Valid bit can be manufactured as a standard
6-transistor SRAM cell, or be refreshed periodically to ensure
its reliability. A decay counter per cache line (implemented
as a finite state machine as described in the
SRAM cache decay above) counts time since the last
access to the cache line. At the end of the safe period,
the decay counter resets the Valid bit to indicate that the
data is invalid.

[0043] Because the memory cells in a 4T-DRAM discharge
at a specific rate (as do the DRAM memory
cells), there is no benefit in "decaying" and invalidating
a cache line any sooner than the end of the safe period.
Because the safe period is large (on the order of milliseconds,
which translates to hundreds of thousands of
machine cycles for processor clocks in the hundreds of
MHz), the local cache-line decay counters can be very
coarse-grained (i.e., with very low resolution). In this
case, single-bit local cache-line decay counters can be
used. The global counter ticks with a period equal to half
of the safe period. The single-bit local counter implements
the finite state machine shown in FIG. 10. Since the timing
of the last access to a cache line in relation to the next
global tick pulse is unknown, decay intervals range from
half of the safe period to a full safe period. On average,
the decay interval of the 1-bit decay counter is 3/4 of the
safe period.
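By way of illustration only, the range and average of the decay interval stated above can be checked with the following sketch. It assumes a two-state behaviour for the single-bit counter (an access sets the bit, the first subsequent tick clears it, and the next tick resets the Valid bit); the state machine of FIG. 10 is not reproduced here, and the function names `decay_interval` and `mean_decay_interval` are hypothetical.

```python
from fractions import Fraction


def decay_interval(access_offset, safe_period):
    """Time from the last access until the Valid bit is reset.

    access_offset is the time elapsed since the previous global tick
    when the access occurs (0 <= access_offset < safe_period / 2).
    Assumed behaviour: the first tick after the access clears the
    local bit, and the second tick invalidates the line.
    """
    tick_period = Fraction(safe_period) / 2
    if not 0 <= access_offset < tick_period:
        raise ValueError("access_offset must lie within one tick period")
    first_tick = tick_period - access_offset   # local bit is cleared
    second_tick = first_tick + tick_period     # Valid bit is reset
    return second_tick


def mean_decay_interval(safe_period, samples=1000):
    """Average decay interval for accesses spread evenly over a tick period."""
    tick_period = Fraction(safe_period) / 2
    # Midpoints of equal-width bins across one tick period.
    offsets = [tick_period * (2 * i + 1) / (2 * samples) for i in range(samples)]
    return sum(decay_interval(o, safe_period) for o in offsets) / samples
```

An access that coincides with a tick decays after a full safe period, an access just before a tick decays after slightly more than half of the safe period, and evenly distributed accesses average three quarters of the safe period.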
[0044] FIG. 9 illustrates an exemplary 4T-DRAM 
memory cell (bit) containing four transistors. WRD is the 
word line used when accessing the cell. B and B-BAR 
are the bit lines that read or write the value of the cell. 
The 4T-DRAM memory cell directly replaces ordinary 
6-transistor SRAM memory cells. FIG. 10 illustrates a 
state diagram for a one-bit local cache-line counter used 
to decay the 4T-DRAM cache lines of FIG. 9. 

Write-Back Cache Implementation 

[0045] A processor that employs a write-back cache
architecture can update the cache with a new value without
updating the corresponding location(s) of main
memory. Thus, if the decay interval for a cache line has
been reached, the cache line cannot be powered down
in accordance with the present invention until the modified
data has been written to the appropriate location of
main memory. Thus, as shown in FIG. 8, in addition to
having fields for the Valid bit, tag and data, each cache
line of a cache 800 optionally maintains a "dirty bit"
identifier indicating whether the value stored in a given
cache line needs to be written back to the appropriate
location of main memory, as identified by the tag. The
dirty bit is set by the processor each time the cache is
updated with a new value without updating the
corresponding location(s) of main memory.
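The ordering constraint described above (write back first, then remove power) can be sketched as follows. This is a simplified model under stated assumptions, not the disclosed circuit: the `CacheLine` class, the `write` and `decay` functions, and the use of the tag directly as the main-memory address are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    tag: int = 0
    data: int = 0
    valid: bool = False
    dirty: bool = False
    powered: bool = True


def write(line, tag, value):
    """Processor updates the cache only, so main memory becomes stale."""
    line.tag, line.data = tag, value
    line.valid = True
    line.dirty = True      # main memory was not updated
    line.powered = True


def decay(line, main_memory):
    """Invoked once the decay interval of `line` has been reached."""
    if line.valid and line.dirty:
        # Assumption: the tag alone identifies the main-memory location.
        main_memory[line.tag] = line.data
        line.dirty = False
    line.valid = False     # the contents are about to be lost
    line.powered = False   # only now may power be removed
```

For example, after `write(line, 0x40, 7)` followed by `decay(line, memory)`, the dictionary `memory` holds the value 7 at address 0x40 and the line is invalid and powered down.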

[0046] In a further variation, the global time signal, T,
supplied by the global cycle counter 310 can be cascaded
from one cache line counter to another (or from one
group of counters to another), to distribute the "writing
back" of "dirty" cache lines in a write-back cache
implementation. Thus, a given cache line (or group of cache
lines) writes back to main memory, if necessary, before
passing the global time signal, T, to the next cache line
(or group of cache lines). In this manner, the cascading
of the distribution of the global time signal, T, spreads
out the decay associated with a single count from the
global counter 310.
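One possible software model of this cascading is sketched below, for illustration only. The function `cascade_tick`, the dictionary keys `id`, `dirty` and `expired`, and the `write_back` callback are hypothetical names introduced for the sketch; the hardware distribution of T is not modeled.

```python
def cascade_tick(groups, write_back):
    """Pass one global tick T through groups of cache lines in sequence.

    groups: list of groups; each group is a list of dicts with keys
            'id', 'dirty' and 'expired' (decay interval reached).
    write_back: callable invoked with the id of each dirty, expired line.
    Returns, for each step of the cascade, the ids written back.
    """
    schedule = []
    for group in groups:              # T reaches one group at a time
        written = []
        for line in group:
            if line["expired"] and line["dirty"]:
                write_back(line["id"])
                line["dirty"] = False
                written.append(line["id"])
        schedule.append(written)      # only now is T passed to the next group
    return schedule
```

Because each group finishes its write-backs before the next group sees the tick, the write-back traffic caused by a single count of the global counter is spread over several steps instead of occurring at once.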

[0047] The additional dynamic power dissipated due
to the decay circuitry of the present invention is proportional
to the product of its load capacitance and the transistor
switching activity. For the digital implementation
described herein, fewer than 110 transistors switch on
average every cycle. The entire decay circuitry involves a
very small number of transistors: a few hundred for the
global counter plus under 30 transistors per local cache
line counter. All local counters change value with every
T pulse. However, this happens at very coarse intervals
(equal to the period of the global counter). Resetting a
local counter with an access to a cache line is not a
cause for concern either. If the cache line is heavily
accessed, the counter has no opportunity to change from
its initial value, so resetting it does not expend any
dynamic power (none of the counter's transistors switch).
The cases where power is consumed are accesses to
cache lines that have been idle for at least one period
of the global counter. It has been observed that over all
the 2-bit counters used by the present invention, there
is less than one bit transition per cycle on average. Thus,
the dynamic power dissipation of the decay circuitry is
negligible compared to the dynamic power dissipated in
the remainder of the chip, which presumably contains
millions of transistors.

Cache Decay in Multilevel Cache Hierarchies

[0048] Many systems employ multilevel cache hierarchies
consisting of a relatively small and fast level-one
(L1) cache and one or more levels of increasingly larger
and slower caches, level-two (L2), level-three (L3), ...
level-N (LN). Typically, larger caches that are lower in
the hierarchy see far fewer accesses, spaced farther
apart (since the processor's stream of accesses is
filtered by hits in upper levels). Decay intervals for each
level should be sized accordingly (i.e., larger decay
intervals for larger low-level caches). Decay intervals can
be sized by the appropriate selection of the tick signal
(T) period.

[0049] Multilevel inclusion in a cache hierarchy ensures
that the contents of a higher level cache (e.g., L1)
are a subset of the contents of its immediate lower level
cache (e.g., L2). Multilevel inclusion is enforced in many
designs. In cache hierarchies that do not enforce multilevel
inclusion, cache lines can be independently decayed
in the different level caches. For example, data
in L2 can be decayed independently of their existence
in L1. In cache hierarchies where multilevel inclusion is
enforced, decay mechanisms must be modified to preserve
inclusion.

[0050] Typically, in a cache hierarchy where multilevel
inclusion is enforced, an inclusion bit in every cache line
(in all levels except L1) indicates whether the cache-line
data exist in upper levels. In the case of different
size cache lines among hierarchy levels, multiple inclusion
bits may be used to indicate the exact parts of a
cache line that exist in upper levels. If, for example, the
L1 line size is 32 bytes and the L2 line size is 128 bytes,
4 inclusion bits in the L2 cache lines are used to accurately
reflect data inclusion. The decay mechanism of
the present invention is modified so that when the inclusion
bit indicates the existence of the same data in upper
levels, only the data field is allowed to decay but not the
tag field. The tag field remains powered on, thus preserving
a placeholder for the data that exist in the upper
levels. Multilevel inclusion is preserved for the tags only.
Decay of a cache line that does not exist in higher levels
proceeds normally. Access to a cache line with only the
tag powered on results in a miss. Another solution that
preserves multilevel inclusion would be to allow the decay
of a cache line only if it was also decayed in upper
levels. Decay of a cache line that does not exist in higher
levels proceeds normally.
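As a brief worked illustration of the first solution, the sketch below computes the number of inclusion bits for the line sizes given above and decides which fields stay powered when the decay interval elapses. The function names `inclusion_bits_needed` and `fields_powered_after_decay` are hypothetical, and the model ignores timing.

```python
def inclusion_bits_needed(lower_line_bytes, upper_line_bytes):
    """One inclusion bit per upper-level line that fits in a lower-level line."""
    if lower_line_bytes % upper_line_bytes:
        raise ValueError("lower-level line size must be a multiple of the upper-level size")
    return lower_line_bytes // upper_line_bytes


def fields_powered_after_decay(inclusion_bits):
    """Power state of a lower-level line once its decay interval has elapsed.

    If any inclusion bit is set, the data field decays but the tag stays
    powered as a placeholder; otherwise the whole line decays normally.
    """
    tag_stays_on = any(inclusion_bits)
    return {"data": False, "tag": tag_stays_on}
```

With a 32-byte L1 line and a 128-byte L2 line, `inclusion_bits_needed(128, 32)` gives 4, and an L2 line with inclusion bits `[0, 1, 0, 0]` keeps only its tag powered.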

[0051] It is to be understood that the embodiments 
and variations shown and described herein are merely 
illustrative of the principles of this invention and that var- 
ious modifications may be implemented by those skilled 
in the art without departing from the scope and spirit of 
the invention. 



Claims 

1. A cache memory, comprising: 

a plurality of cache lines for storing a value from 
main memory; and 

a timer associated with each of said plurality of 
cache lines, each of said timers configured to 
control a signal that removes power to said as- 
sociated cache line after a decay interval. 

2. The cache memory of claim 1, wherein a timer
associated with a given cache line is reset each time
said associated cache line is accessed.

3. The cache memory of claim 1 or claim 2, further
comprising a dirty bit associated with each of said
cache lines to indicate when a contents of said
cache line must be written back to main memory
before said power is removed from said associated
cache line after said decay interval.

4. The cache memory of claim 3, wherein one or more
of said timers associated with said plurality of cache
lines are cascaded to distribute said writing back to
main memory over time.

5. The cache memory of any preceding claim, wherein
a first access to a cache line that has been powered
down results in a cache miss, resets said corresponding
timer and restores power to said cache
line.

6. The cache memory of any preceding claim, wherein
a first access to a cache line that has been powered
down is delayed by a period of time that permits said
cache line to stabilize after power is restored.

7. The cache memory of any preceding claim, wherein
said cache memory is a multilevel cache memory,
and further comprising one or more inclusion bits
associated with each of said cache lines, said inclusion
bits indicating whether data stored in said
cache line exists in upper levels and wherein said
power is removed only from a data field associated
with said cache line after said decay interval when
said inclusion bits indicate the existence of the
same data in an upper level.

8. The cache memory of any preceding claim, wherein
said cache memory is a multilevel cache memory,
and wherein said power is removed from a cache
line after said decay interval only if said power is
removed in corresponding upper levels of said multilevel
cache memory.

9. A method for reducing leakage power in a cache
memory, said cache memory having a plurality of
cache lines, said method comprising the steps of:

providing a timer for each of said cache lines;

resetting said timer each time said cache line
is accessed; and

removing power from said associated cache
line after a decay interval.

10. The method of claim 9, further comprising the step
of evaluating a dirty bit associated with each of said
cache lines that indicates when a contents of said
cache line must be written back to main memory
before said power is removed from said associated
cache line after said decay interval.

11. The method of claim 9 or claim 10, wherein said
step of removing power from said associated cache
line further comprises the step of resetting a valid
field associated with said cache line.

12. The method of any one of claims 9 to 11, wherein a
first access to a cache line that has been powered
down further comprises the steps of establishing a
cache miss, resetting said corresponding timer and
restoring power to said cache line.

13. The method of any one of claims 9 to 12, wherein
a first access to a cache line that has been powered
down further comprises the step of delaying said access
by an appropriate amount of time until said
cache line stabilizes after power is restored.

14. A cache memory, comprising:

a plurality of cache lines for storing a value from
main memory, each of said cache lines comprised
of one or more dynamic random access
memory (DRAM) cells, each of said DRAM
cells being refreshed each time said cache line
is accessed, each of said DRAM cells reliably
storing said value for a safe period; and

a timer associated with each of said plurality of
cache lines, each of said timers controlling a
signal that resets a valid bit associated with
said cache line after said safe period.

15. The cache memory of claim 14, further comprising
a dirty bit associated with each of said cache lines
to indicate when a contents of said cache line must
be written back to main memory before said power
is removed from said associated cache line after
said decay interval.

16. A method for reducing leakage power in a cache
memory, said cache memory having a plurality of
cache lines, for storing a value from main memory,
each of said cache lines comprised of one or more
dynamic random access memory (DRAM) cells,
each of said DRAM cells reliably storing said value
for a safe period, said method comprising the steps
of:

refreshing each of said DRAM cells each time
said corresponding cache line is accessed; and

providing a timer for each of said cache lines;

resetting said timer each time said cache line
is accessed; and

resetting a valid bit associated with said cache
line after said safe period.






EP 1 202 287 A2 

FIG. 1 (PRIOR ART): [block diagram: PROCESSOR 110, CACHE 120, MAIN MEMORY 130]







FIG. 2 (PRIOR ART): [cache structure: cache lines numbered 0, 1, 2, ... C-1, each with TAG, DATA (K WORDS) and VALID bit fields]



FIG. 3: [drawing 300: global counter 310, TICK PULSE (T), reference numeral 330, VALID BITS, 2-BIT DECAY COUNTERS 320-n, and a CACHE ARRAY of CACHE-LINE DATA/TAG entries]






FIG. 4: [drawing 400; one label survives as "P00FF"]



FIG. 5: [drawing 500 with input WRD SIGNAL (ACCESS); next-state expressions for S0 and S1 in terms of S0, S1, T and WRD, with the complement bars lost in extraction]






FIG. 7: [circuit with Vdd and a VOLTAGE COMPARATOR]

FIG. 8: [cache structure: cache lines numbered 0, 1, 2, ... C-1, each with TAG, DATA (K WORDS), VALID BIT and DIRTY BIT fields]






FIG. 9: [surviving labels: WRD, T/POWER OFF]






